ImageGear's Recognition component enables you to build OCR applications for the Windows development environment. An application using the Recognition API can:
- Accept as input any image file format supported by ImageGear. This image data can be binary, gray, or color.
- Recognize the text in an image.
- Save recognition data in either a simple text format or a document format, such as MS Word or XML. The component supports saving multi-page documents.
- Select the language or languages of documents to be recognized. A total of 123 languages are supported, including the following:
  - Latin alphabet languages, such as English, Eastern and Western European languages, and Baltic languages
  - Cyrillic alphabet languages, such as Russian
  - Asian languages, such as Chinese, Japanese, and Korean
  - Turkish
  - Greek
  Documents containing multiple languages can be recognized accurately because the API allows the application to specify the set of languages used for recognition.
- Enable end users to verify text during the recognition process.
- Increase recognition accuracy with built-in and user-defined dictionaries.
- Output confidence values for post-recognition processing.
- Automatically segment the page to correctly recognize text on pages with complex or irregular layouts, including pages with tables and graphics.
- Allow the user to delineate zones of a document page and then specify treatment for those zones. This includes the ability to correct the OCR engine's automatic segmentation between the segmentation phase and the recognition phase.
- Process both text and graphics. The recognition software's ability to distinguish graphics from text can provide the basis of a compound document processing system.
- Automatically detect fax, dot matrix, and other degraded documents and compensate accordingly.
- Use a scalable voting architecture that provides developers with two pre-made voting interfaces (PLUS2W and PLUS3W) as well as direct access to the OCR engines.
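
To make the voting and confidence-value ideas in the list above more concrete, here is a minimal, generic sketch in Python. It is not the ImageGear Recognition API, and it does not describe how PLUS2W or PLUS3W work internally. The names `CharResult`, `vote`, and `flag_for_review`, the 0.0 to 1.0 confidence scale, and the 0.6 threshold are all hypothetical choices made for illustration. The sketch assumes each engine returns one result per character position, which real OCR output does not guarantee.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CharResult:
    """One recognized character with a confidence in [0.0, 1.0] (hypothetical type)."""
    char: str
    confidence: float


def vote(engine_outputs):
    """Combine per-character results from several engines by simple majority vote.

    Assumes every engine returned the same number of characters; aligning
    outputs of different lengths is out of scope for this sketch.
    """
    combined = []
    for candidates in zip(*engine_outputs):
        counts = Counter(c.char for c in candidates)
        winner, votes = counts.most_common(1)[0]
        # Winner's confidence: the best score among the engines that chose it,
        # scaled by the fraction of engines that agreed.
        best = max(c.confidence for c in candidates if c.char == winner)
        combined.append(CharResult(winner, best * votes / len(candidates)))
    return combined


def flag_for_review(results, threshold=0.6):
    """Return the indexes of characters whose confidence is below the threshold."""
    return [i for i, r in enumerate(results) if r.confidence < threshold]
```

An application could pass the combined results to `flag_for_review` and present only the flagged positions to the end user for verification, so that manual checking stays focused on characters where the engines disagreed or were unsure.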